
Conversation

@cheme (Contributor) commented Aug 7, 2020

Light client CHT pruning on polkadot is erasing headers that babe requires during its epoch pruning.

The header cache allows things to work, except that if the node gets shut down and restarted, the db is missing some headers and the chain gets bricked.

This PR creates a SharedPruningRequirements struct that is designed to be passed around client components and to hold db-pruning-related constraints.

Then the CHT pruning (when babe sets need_mapping_for_light_pruning of the shared pruning requirements to true) will not
prune the number-to-key lookup for the canonical header.
It is babe that will later prune it through its fork tree epoch pruning.

For this PR the only constraint is a finalized block height to keep untouched; it is set by babe depending on its stored epochs and constrains the CHT pruning of the light client headers.

This means that in babe we update this info after epoch pruning, and in the light client we limit CHT pruning; when the limit applies, we need to buffer the pending pruning ranges.
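For reference, a rough sketch of the shape such a shared constraint could take (type, field and method names here are illustrative, not the PR's actual code):

```rust
use std::sync::{Arc, RwLock};

/// Sketch only: pruning constraint shared between client components.
/// The single constraint for now is a finalized block height at or above
/// which the light client must not prune its header mappings.
#[derive(Clone)]
pub struct SharedPruningRequirements<N> {
    keep_from: Arc<RwLock<Option<N>>>,
}

impl<N: Copy> SharedPruningRequirements<N> {
    pub fn new() -> Self {
        Self { keep_from: Arc::new(RwLock::new(None)) }
    }

    /// Set by babe after its epoch pruning, based on its stored epochs.
    pub fn set_keep_from(&self, number: N) {
        *self.keep_from.write().expect("lock poisoned") = Some(number);
    }

    /// Read by the light client before CHT pruning: mappings for blocks
    /// at or above this height must be kept for now.
    pub fn keep_from(&self) -> Option<N> {
        *self.keep_from.read().expect("lock poisoned")
    }
}
```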

Note that someone with a better grasp of ForkTree and babe EpochChange may be able to thin down the needed_parent_relation range.

At first I tried to do things in a simpler/quicker way, but none of my attempts were good. There may still be a more straightforward solution that does not rely on assumptions (that would be better than this PR, which adds code).

Failed previous attempts:

- treat a missing header as not being a descendant of the node, or some other default behaviour for missing nodes.
  -> does not work well, it actually breaks the fork tree.
- keep a threshold number of blocks for CHT pruning (but it does not really work since we do not know how long it will be before the next babe pruning, as finalization can stall).
- do babe pruning more often and before CHT pruning.
  -> does not work well with polkadot where the slot length (2400) > the CHT pruning window.

@cheme cheme added A3-in_progress Pull request is in progress. No review needed at this stage. B3-apinoteworthy labels Aug 7, 2020
@cheme cheme requested a review from andresilva as a code owner August 7, 2020 19:52
Comment on lines 540 to 551
#[derive(Clone)]
/// Pruning requirement to share between multiple client component.
///
/// This allows pruning related synchronisation. For instance in light
/// client we need to synchronize header pruning from CHT (every N blocks)
/// with the pruning from consensus used (babe for instance require that
/// its epoch headers are not pruned which works as long as the slot length
/// is less than the CHT pruning window.
/// Each compenent register at a given index (call are done by this order).
///
/// Note that this struct could be split in two different struct (provider without
/// component and Component), depending on future usage of this shared info.
A Member reviewer suggested moving the `#[derive(Clone)]` attribute below the doc comment, so the doc comment comes first and the derive sits directly above the struct (doc comment text unchanged).

Comment on lines 673 to 674
#[derive(Eq, PartialEq)]
/// Define a block number limit to apply.
A Member reviewer suggested the same reorder here: move `#[derive(Eq, PartialEq)]` below the doc comment (doc comment text unchanged).

@cheme cheme added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 17, 2020
@cheme (Contributor, Author) commented Aug 17, 2020

I simplified the code a bit. I could still use traits to make instantiation a bit safer, and I could also drop support for other sources of constraints and make the code non-generic.
Also, maybe we do not need to keep multiple pending pruning ranges but only a single aggregated range.
But at this point I am still wondering if there might be some better way to fix this (avoid babe querying pruned headers). cc @svyatonik?

pub enum PruningLimit<N> {
    /// Ignore.
    None,
    /// The component require at least this number
A Contributor commented:

IIUC from the code, N here is actually the number of the oldest block that shouldn't be pruned. If I'm right, then the doc comment looks wrong - i.e. some component may need the last 1024 headers, but we may be at block 1_000_000, so N would actually be 1_000_000 - 1024.
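Under that reading, the enum and its docs might look roughly like this (a sketch only; the second variant's name and payload are assumptions, since the quoted snippet above is truncated):

```rust
/// Sketch: the `KeepFrom` variant name and payload are assumed for illustration.
pub enum PruningLimit<N> {
    /// No constraint from this component.
    None,
    /// Absolute number of the oldest block that must not be pruned.
    /// E.g. a component needing the last 1024 headers while the chain is at
    /// block 1_000_000 would report KeepFrom(1_000_000 - 1024).
    KeepFrom(N),
}
```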

cache: Arc<DbCacheSync<Block>>,
header_metadata_cache: Arc<HeaderMetadataCache<Block>>,
shared_pruning_requirements: SharedPruningRequirementsSource<Block>,
pending_cht_pruning: RwLock<VecDeque<(NumberFor<Block>, NumberFor<Block>)>>,
@svyatonik (Contributor) commented Aug 17, 2020:

Actually we may only store the numbers of the oldest and newest blocks we want to prune. I.e. instead of vec![(1, 2048), (2049, 4096), (4097, 5120), ..., (5121, 8192)] we may only store two numbers - 1 and 8192 :)
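A small sketch of that simplification, replacing the VecDeque of ranges with a single aggregated span (names are illustrative, not the PR's code):

```rust
use std::sync::RwLock;

/// Sketch: remember only the overall span of blocks whose CHT pruning was
/// deferred, instead of a queue of individual (from, to) ranges.
pub struct PendingChtPruning<N> {
    range: RwLock<Option<(N, N)>>, // (oldest, newest), inclusive
}

impl<N: Copy + Ord> PendingChtPruning<N> {
    pub fn new() -> Self {
        Self { range: RwLock::new(None) }
    }

    /// Merge a newly deferred range into the aggregate.
    pub fn push(&self, from: N, to: N) {
        let mut guard = self.range.write().expect("lock poisoned");
        *guard = Some(match *guard {
            None => (from, to),
            Some((lo, hi)) => (lo.min(from), hi.max(to)),
        });
    }

    /// Take the whole pending span once pruning becomes possible again.
    pub fn take(&self) -> Option<(N, N)> {
        self.range.write().expect("lock poisoned").take()
    }
}
```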

@svyatonik (Contributor) commented:

Re the simpler option - I haven't finished reviewing yet + I only have shallow knowledge of epochs + pruning. But would it be enough to know whether header H (associated with epoch E) is canonical or non-canonical to determine if we need to prune the entry E in the fork tree? If so, we may tune header pruning a bit so that the number => hash mapping won't be pruned for canonical headers - i.e. change the remove_key_mappings call to this.
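The link behind "this" is not preserved here, but a purely illustrative sketch of that kind of tweak, on toy in-memory columns rather than the real client-db code, could look like:

```rust
use std::collections::HashMap;

/// Toy stand-ins for the light-db columns involved (illustrative only).
#[derive(Default)]
struct LightColumns {
    headers: HashMap<[u8; 32], Vec<u8>>,    // hash -> encoded header
    number_to_hash: HashMap<u64, [u8; 32]>, // number -> canonical hash
}

/// Sketch of the suggestion: always drop the header itself, but keep the
/// number -> hash lookup for canonical headers so the fork-tree pruning can
/// later resolve canonicality without needing the pruned header.
fn prune_header(cols: &mut LightColumns, number: u64, hash: [u8; 32], is_canonical: bool) {
    cols.headers.remove(&hash);
    if !is_canonical {
        cols.number_to_hash.remove(&number);
    }
}
```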

One last addition - IIUC, BABE fails to import a block if pruning fails, right? How critical is pruning for BABE? If it isn't critical, then it is probably better to just log the issue rather than fail the process?

@cheme (Contributor, Author) commented Aug 17, 2020

Re simpler option - I haven't finished reviewing yet + I only have shallow knowledge of epochs + pruning. But would it be enough to know if header H (associated with epoch E) is canonical/non-canonical to determine if we need to prune the entry E in the fork tree?

I wonder if knowing whether H is canonical/non-canonical is actually the operation in the fork tree that requires reading CHT-pruned headers (basically it fails when looking for the common ancestor of two epoch fork tree roots).
IIRC the missing header occurs at

child.number < *number && is_descendent_of(&child.hash, hash).unwrap_or(false))

on the is_descendent_of call.

If so, we may tune header pruning a bit so that number => hash won't be pruned for canonical headers - i.e. change remove_key_mappings call to this.

I am not sure I understand this correctly, but yes, I feel like there may be something to do in babe to avoid querying old headers; my attempts at doing so were not very successful.

BABE fails to import block if pruning fails, right? How critical is pruning for BABE? If it isn't critical, then probably it is better just to log the issue, not to fail the process?

Seems like a good idea to me.
It still means that babe will never get pruned, but it is probably better to be able to use the light client for a while before resyncing it.
In fact, if we don't exit the light client during sync, the issue never occurs thanks to the header cache.

cheme added 2 commits August 17, 2020 15:12
This reverts commit bf40274.

Allowing failure on a PR that tries to avoid failure seems awkward.
@svyatonik (Contributor) commented:

My idea is to prune all fork tree nodes that correspond to blocks with number < best_finalized_number and that are non-canonical (this is why we need to leave the number => hash mapping in light storage after pruning the header). Their children must be pruned too. Maybe some new ForkTree method? ForkTree::iterate_prune(predicate)?

So basically you start with the roots - if some roots have number < last_finalized and they're non-canonical (this doesn't require the header to be in storage), then you prune them together with their children. Then you proceed to the children of the selected roots and again only keep canonical ones. Continue until you find that all roots must be kept. Does it make sense?
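A rough sketch of what such an iterate_prune walk could look like, using a toy tree type rather than the real ForkTree (the method does not exist yet; this only illustrates the traversal described above):

```rust
/// Toy stand-in for a fork-tree node (not the real `ForkTree` type).
struct Node {
    number: u64,
    hash: [u8; 32],
    children: Vec<Node>,
}

/// Prune every subtree whose root is below the finalized number and
/// non-canonical; keep walking into the subtrees that are retained.
fn iterate_prune(
    roots: &mut Vec<Node>,
    best_finalized: u64,
    is_canonical: &impl Fn(&[u8; 32]) -> bool,
) {
    roots.retain_mut(|root| {
        if root.number < best_finalized && !is_canonical(&root.hash) {
            // Finalized and non-canonical: the whole subtree can go, and
            // this check never needed the pruned header itself.
            false
        } else {
            iterate_prune(&mut root.children, best_finalized, is_canonical);
            true
        }
    });
}
```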

@cheme (Contributor, Author) commented Aug 18, 2020

I see: keep a KeyLookup mapping to resolve the fork tree pruning (remove non-canonical roots without having to compute a common parent through headers).
Then the pruning can use the fork tree information to remove those lookups (through the fork tree child relation).
Here we can code this without a 'babe - light_db' dependency from babe's perspective: you only need to let babe know it needs to delete mappings on pruning, which is a simple configuration to set properly when running a light client.
From the 'light-db' (CHT pruning) perspective you also need a configuration to skip pruning the number-to-key lookup; this configuration depends on the consensus used, so we still need some SharedPruningRequirement, but it does not have to share memory and is only used when instantiating the light client (to ensure the correct configuration is used).

So in this case SharedPruningRequirement just says: one of the earlier components ('babe') took responsibility for pruning the non-canonical 'from number' key lookups, so CHT pruning does not need to.

If keeping shared memory we could also simply store those fork tree roots in the SharedPruningRequirement and compute their new state before CHT pruning. But that would only optimize the db footprint and is probably not worth it.

I will try the first solution.
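If that first solution works out, the shared struct could shrink to a plain configuration fixed at instantiation, roughly along these lines (a sketch only; names are illustrative, not final PR code):

```rust
/// Sketch: a configuration decided when building the light client,
/// with no shared mutable state.
#[derive(Clone, Copy, Default)]
pub struct PruningRequirements {
    /// True when the consensus component (babe here) takes responsibility
    /// for deleting the non-canonical number -> hash lookups during its
    /// fork-tree epoch pruning, so CHT pruning must leave them in place.
    pub consensus_prunes_number_mappings: bool,
}
```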

@cheme cheme added A3-in_progress Pull request is in progress. No review needed at this stage. and removed A0-please_review Pull request needs code review. labels Aug 18, 2020
@svyatonik (Contributor) commented:

From the 'light-db' (CHT pruning) perspective you also need a configuration to skip pruning the number-to-key lookup; this configuration depends on the consensus used, so we still need some SharedPruningRequirement, but it does not have to share memory and is only used when instantiating the light client (to ensure the correct configuration is used).

Probably better to leave this info in light-db forever - it'll be easier (unless it breaks anything else). Maybe an optimization for follow-up PRs? Not sure. But I'm not insisting :)

@cheme (Contributor, Author) commented Aug 18, 2020

Sync seems to be working OK with the new code (the number of key lookups sometimes peaks quite high but seems to come back down close to the number of headers).
The code will need some refactoring before being reviewable again.

@cheme cheme requested a review from mxinden as a code owner August 19, 2020 10:26
@cheme cheme added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 19, 2020
@gnunicorn gnunicorn added A1-onice and removed A0-please_review Pull request needs code review. labels Sep 9, 2020
@cheme cheme added the C1-low PR touches the given topic and has a low impact on builders. label Sep 9, 2020
@gnunicorn (Contributor) commented:

Closed due to it being stale for a while.

@gnunicorn gnunicorn closed this Oct 19, 2020